Two-dimensional cache-oblivious sparse matrix-vector multiplication
نویسندگان
چکیده
In earlier work, we presented a one-dimensional cache-oblivious sparse matrix–vector (SpMV) multiplication scheme which has its roots in one-dimensional sparse matrix partitioning. Partitioning is often used in distributed-memory parallel computing for the SpMV multiplication, an important kernel in many applications. A logical extension is to move towards using a two-dimensional partitioning. In this paper, we present our research in this direction, extending the one-dimensional method for cache-oblivious SpMV multiplication to two dimensions, while still allowing only row and column permutations on the sparse input matrix. This extension may require a generalisation of the Compressed Row Storage data structure, to a block-based data structure. Experiments performed on three different architectures show further improvements of the two-dimensional method compared to the one-dimensional method, especially in those cases where the one-dimensional method already provided significant gains. The largest gain obtained by our new reordering is over a factor of 3 in SpMV speed, compared to the natural matrix ordering.
منابع مشابه
Cache-Oblivious Sparse Matrix--Vector Multiplication by Using Sparse Matrix Partitioning Methods
In this article, we introduce a cache-oblivious method for sparse matrix vector multiplication. Our method attempts to permute the rows and columns of the input matrix using a hypergraph-based sparse matrix partitioning scheme so that the resulting matrix induces cache-friendly behaviour during sparse matrix vector multiplication. Matrices are assumed to be stored in row-major format, by means ...
متن کاملA cache-oblivious sparse matrix–vector multiplication scheme based on the Hilbert curve
The sparse matrix–vector (SpMV) multiplication is an important kernel in many applications. When the sparse matrix used is unstructured, however, standard SpMV multiplication implementations typically are inefficient in terms of cache usage, sometimes working at only a fraction of peak performance. Cache-aware algorithms take information on specifics of the cache architecture as a parameter to ...
متن کاملCache Oblivious Dense and Sparse Matrix Multiplication Based on Peano Curves
Cache oblivious algorithms are designed to benefit from any existing cache hierarchy—regardless of cache size or architecture. In matrix computations, cache oblivious approaches are usually obtained from block-recursive approaches. In this article, we extend an existing cache oblivious approach for matrix operations, which is based on Peano space-filling curves, for multiplication of sparse and...
متن کاملA Geometric Approach to Matrix Ordering
We present a recursive way to partition hypergraphs which creates and exploits hypergraph geometry and is suitable for many-core parallel architectures. Such partitionings are then used to bring sparse matrices in a recursive Bordered Block Diagonal form (for processor-oblivious parallel LU decomposition) or recursive Separated Block Diagonal form (for cache-oblivious sparse matrix–vector multi...
متن کاملSparse matrix multiplication: The distributed block-compressed sparse row library
Efficient parallel multiplication of sparse matrices is key to enabling many large-scale calculations. This article presents the DBCSR (Distributed Block Compressed Sparse Row) library for scalable sparse matrix-matrix multiplication and its use in the CP2K program for linear-scaling quantum-chemical calculations. The library combines several approaches to implement sparse matrix multiplication...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Parallel Computing
دوره 37 شماره
صفحات -
تاریخ انتشار 2011